Document fingerprinting for mobile phones

ABSTRACT

One embodiment relates to a method for providing a service which matches document fingerprints against a database of document fingerprints. Target text data on a mobile phone device is obtained, and target document fingerprints are generated for the target text data using a fingerprint generator on the mobile phone device. The target document fingerprints are transmitted to a service cloud. A feedback message is received from the service cloud. The feedback message depends on results from matching the target document fingerprints against the database of document fingerprints. Other embodiments, aspects and features are also disclosed.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to methods and apparatus for document or text string fingerprinting. The technology disclosed herein is applicable for data leakage prevention, spam filtering, and other applications which may use document fingerprinting.

2. Description of the Background Art

One problem in the field of network security relates to data leakage prevention (DLP). DLP is needed to avoid loss of proprietary information, intellectual property, and other sensitive data. To protect sensitive data, enterprises need an effective DLP solution which monitors potential information leaks at the point of use. However, the explosion of messaging systems, wireless networking, and universal serial bus (USB) storage devices has made the protection of critical enterprise data difficult. As a result, enterprises are experiencing an increase in the loss and even theft of data assets by employees, contractors, or even hackers (and malware) who maliciously or accidentally leak data.

Another problem in the field of network security relates to unsolicited messages in e-mail systems. Such unsolicited messages, also referred to as “spam,” are mass mailed by spammers to e-mail accounts over the Internet. Various anti-spam software products have been developed to combat spam.

It is highly desirable to improve technologies which facilitate document or text string fingerprinting for data leakage prevention, spam filtering, and other applications.

SUMMARY

One embodiment relates to a method for providing a service which matches document fingerprints against a database of document fingerprints. Target text data on a mobile phone device is obtained, and target document fingerprints are generated for the target text data using a fingerprint generator on the mobile phone device. The target document fingerprints are transmitted to a service cloud. A feedback message is received from the service cloud. The feedback message depends on results from matching the target document fingerprints against the database of document fingerprints.

Another embodiment relates to a mobile phone device which includes communication circuits configured to receive and send data by way of a cellular telecommunications network. The mobile phone device also includes a data storage system, including memory, configured to store computer-readable instruction code and data, and a processor configured to access the data storage system and to execute the computer-readable instruction code. Computer-readable instruction code in the mobile phone device is configured to generate target document fingerprints for target text data using a fingerprint generator, transmit the target document fingerprints to a service cloud, and receive a feedback message from the service cloud. The feedback message depends on results from matching the target document fingerprints against the database of document fingerprints.

Another embodiment relates to a computer apparatus including data storage configured to store computer-readable instruction code and data, and a processor configured to access the data storage and to execute said computer-readable instruction code. In addition, the computer apparatus includes computer-readable instruction code configured as an enterprise fingerprint agent for a cloud-based service. The computer apparatus further includes computer-readable instruction code configured to generate document fingerprints using a higher-density fingerprint generation procedure which comprises: normalizing target text data to create a normalized text string; applying a first hash function with a sliding hash window to the normalized text string to generate an array of hash values; applying a first filter to the array of hash values to select candidate anchoring points; and applying a second hash function to substrings located at the candidate anchoring points to generate the document fingerprints.

These and other embodiments and features of the present invention will be readily apparent to persons of ordinary skill in the art upon reading the entirety of this disclosure, which includes the accompanying drawings and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A is a high-level diagram of a system for providing a data leakage prevention (DLP) service on smart phones in accordance with an embodiment of the invention.

FIG. 1B shows select operational elements within the system of FIG. 1A in accordance with an embodiment of the invention.

FIG. 2 depicts a first method for generating document fingerprints in accordance with an embodiment of the invention.

FIG. 3 illustrates a sliding hash window and anchoring points after application of a first filter in accordance with an embodiment of the invention.

FIG. 4 depicts a second method for generating document fingerprints in accordance with an embodiment of the invention.

FIG. 5 depicts the division of a normalized textual string into a number of pieces for a second filtering method in accordance with an embodiment of the invention.

FIG. 6 shows the selection of a single anchoring point from a first piece for the second filtering method in accordance with an embodiment of the invention.

FIG. 7 depicts select components of an example computer that may be used in embodiments of the present invention.

FIG. 8 depicts select components of an example mobile device that may be used in embodiments of the present invention.

FIG. 9 depicts select operational elements of a system for providing a spam filtering service on smart phones in accordance with an embodiment of the invention.

DETAILED DESCRIPTION

DLP Service with Document Fingerprinting

With the popularity of smart phones, an increasing number of business users are carrying sensitive business data on smart phones that may or may not belong to their employers. This creates a serious channel for potential leakage of sensitive business data.

Unfortunately, conventional data leakage prevention (DLP) solutions do not work well with mobile phones. This is because conventional endpoint DLP agents require downloading document fingerprints for the enterprise from backend servers. There are several difficulties due to this requirement.

First, the endpoints should be in the enterprise network for downloading fingerprints. In the case of mobile phones, this is not usually the case and is not a reasonable assumption. Second, the volume of fingerprints needing to be downloaded and stored on the mobile phone may be large enough to consume an inordinate amount of bandwidth and storage space on the phone. Third, the process of matching fingerprints is generally CPU intensive enough to consume a substantial portion of the power on the phone. Due to these three difficulties, it is problematic to provide a DLP solution with fingerprint matching at mobile phones.

The present disclosure provides a DLP solution for mobile phones that overcomes the above-discussed difficulties. First, the solution provided by the present disclosure does not require the mobile phone to be a part of the enterprise network. Second, the solution provided by the present disclosure does not require the downloading and storage of fingerprints on the mobile phone. Third, the solution provided by the present disclosure does not require fingerprint matching to be performed by the mobile phone.

FIG. 1A is a high-level diagram of a system 100 for providing a DLP service on smart phones in accordance with an embodiment of the invention. The system includes one or more enterprise networks 110, a service cloud 120, and a plurality of smart phones 130. An enterprise network 110 may be, for example, a wide area network for a large enterprise and may include various network segments. As further shown, an enterprise network 110 may include an enterprise fingerprint agent 112.

In relation to data leakage prevention for the enterprise network 110, various endpoint devices may be defined, including a plurality of smart phone devices 130 and a plurality of other endpoint devices 140. These endpoint devices are devices which are being monitored by the data leakage prevention system in order to prevent leakage of sensitive data.

The smart phone devices 130 may include smart phones and tablets which include telecommunications by way of a cellular phone network and Internet access. Various applications, such as a web browser, an electronic mail client, and other applications may be executed on the mobile operating system of the smart phone devices 130. For example, the mobile operating system may be the iOS mobile operating system available from Apple Inc. of Cupertino, Calif., or may include the Android software stack available from Google, Inc. of Mountain View, Calif., or may be a different mobile operating system. As shown in FIG. 1A, the smart phone devices 130 may be outside of the enterprise network 110. In other words, the smart phone devices 130 need not necessarily be nodes on the enterprise network 110.

The other endpoint devices 140 may include desktop and laptop computers, and other devices with data communication capabilities and/or removable data storage. As shown in FIG. 1A, the other endpoint devices 140 may be part of the enterprise network 110. In other words, the other endpoint devices 140 are typically nodes on a segment of the enterprise network 110.

The service cloud 120 may include various computing and network resources which may work together so as to provide an application service to the enterprise. In this case, the application service may be a DLP service. The DLP service may include DLP agents at endpoint devices to monitor outgoing communications and prevent leakage of sensitive (protected) information. In accordance with an embodiment of the invention, smart phone devices 130 may include a fingerprint-less mobile agent (FLMA) 132 to provide a bandwidth-efficient content-aware DLP solution for the smart phone devices 130. An embodiment of the FLMA 132 is described further below in relation to FIG. 1B.

FIG. 1B shows select operational elements within the system 100 of FIG. 1A in accordance with an embodiment of the invention. In particular, select operational elements utilized by the enterprise fingerprint agent (EFA) 112, the service cloud 120, and a smart phone 130 are shown.

Within the enterprise network 110, there is data file storage which may include sensitive document files 111 as determined by a DLP administrator for the enterprise. The sensitive document files 111 include sensitive text data, such as, for example, confidential personal information of employees, trade secret information of the enterprise, and other sensitive information.

The EFA 112 may be embodied as program code executing on a computer system within the enterprise network 110. The EFA 112 may be configured to perform various functions, including classifying sensitive documents, generating fingerprints based on text data in the sensitive documents 111, and uploading the fingerprints to the service cloud 120.

In accordance with an embodiment of the invention, the fingerprints may be generated by the EFA 112 using a higher-density fingerprint generator 114. In one implementation, the higher-density fingerprint generator 114 may be configured to utilize a single-stage fingerprint filtering technique which is described in further detail below. The sensitive document fingerprints for the enterprise may be stored in a local fingerprint database 116.

Within the service cloud 120, there may be a hosted fingerprint database 122. The hosted fingerprint database 122 may be a copy of the local fingerprint database 116 and may be hosted on one or more data storage devices within the service cloud 120. The hosted fingerprint database 122 may be frequently updated with database updates 117 to keep the local and hosted databases closely synchronized. Fingerprint database updates may be transmitted from the enterprise fingerprint agent 112 in the enterprise network 110 to the service cloud 120.

The service cloud 120 may also include a fingerprint match engine 124 and a data leakage prevention policy engine (DLP policy engine) 126. The fingerprint match engine 124 and the DLP policy engine 126 may each be embodied as program code executing on one or more server computers within the service cloud 120.

The fingerprint match engine 124 may be configured to receive fingerprints 145 of the target text data 131 from the smart phone 130 (or other endpoint devices 140). The fingerprint match engine 124 applies a matching method to determine to what degree the fingerprints 145 of the target text data 131 match, or do not match, one or more fingerprints in the hosted fingerprint database 122. The results of the matching may be communicated to the DLP policy engine 126.

The DLP policy engine 126 may be configured to receive local match results 147 from the local match engine 134 on the smart phone 130 (or other endpoint device 140). The DLP policy engine 126 may then apply policies, which may be specific to a particular enterprise, to the match results. In other words, different enterprises may have different policies to be applied by the DLP policy engine 126. Feedback 150 based on the match results and applied policies may then be returned from the service cloud 120 to the smart phone 130 (or other endpoint device 140) which had sent the local match results 147.

The service cloud 120 may provide a web-based console that an enterprise DLP administrator may use to manage the DLP service by creating the DLP policies, compliance templates, digital assets such as keyword dictionaries and regular expression patterns, and so forth. The enterprise DLP administrator may also use such a web-based console to manage the hosted fingerprint database for the enterprise and to generate DLP reports.

In accordance with an embodiment of the invention, the DLP policy engine 126 is configured to receive results from the fingerprint match engine 124 regarding whether or not the fingerprints received from a smart phone 130 (or other endpoint device 140) match one or more fingerprints in the hosted fingerprint database 122. The DLP policy engine 126 may then apply policies, which may be specific to a particular enterprise, to the results. Feedback 150, based on the match results and the applied policies, is then returned to the smart phone 130 (or other endpoint device 140) which had sent the fingerprints 145 to be matched.
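The interaction between the fingerprint match engine 124 and the DLP policy engine 126 may be sketched as follows. This is a minimal illustration only: the function names, the dictionary layout of the hosted database, the thresholds, and the feedback strings are all assumptions made for the example and are not specified by this disclosure.

```python
from collections import Counter


def match_fingerprints(target_fps, hosted_db):
    """Count, per sensitive document, how many target fingerprints it shares.

    hosted_db is assumed to map fingerprint -> set of document ids
    (a hypothetical layout for the hosted fingerprint database).
    """
    hits = Counter()
    for fp in set(target_fps):
        for doc_id in hosted_db.get(fp, ()):
            hits[doc_id] += 1
    return dict(hits)


def apply_policy(hits, block_threshold=3, alert_threshold=1):
    """Turn match results into feedback using hypothetical per-enterprise thresholds."""
    best = max(hits.values(), default=0)
    if best >= block_threshold:
        return "block"
    if best >= alert_threshold:
        return "alert"
    return "allow"
```

A different enterprise could supply different thresholds, which corresponds to the enterprise-specific policies described above.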

The smart phone device 130 (or other endpoint device 140) may include target text data 131 and the FLMA 132. The target text data 131 may be, for example, in a document file that is being exported from the device. For example, the target text data 131 may be in a file attached to an outgoing electronic mail or text message, or the target text data may be in the body of the outgoing electronic mail or text message. Since the device is an endpoint of the DLP system, such an exportation of text data is monitored by the DLP system and may be checked by fingerprint matching against the hosted fingerprint database 122 used by the service cloud 120.

The FLMA 132 may include program code configured to implement a local match engine 134 and a fingerprint generator. The local match engine 134 may be configured to match non-fingerprint-related attributes such as keywords, regex patterns, and file attributes. The matching techniques utilized by the local match engine 134 may be less CPU intensive than fingerprint matching. The local match result may be compressed and sent to the DLP policy engine 126 in the service cloud 120. The fingerprint generator is configured to generate digital “fingerprint” data (“document fingerprints” or simply “fingerprints”) of the target text data 131. In addition, the FLMA 132 may include an action module 138 which is configured to take actions such as providing an alert or blocking the data leak. The alert may comprise, for example, a short text message that may be sent to the phone user as a warning. The data leak blocking may comprise, for example, a command to block an outgoing message from the phone.
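The kind of non-fingerprint matching performed by the local match engine 134 may be sketched as follows. The function names, the result layout, the use of file extensions as the file attribute, and the JSON-plus-zlib encoding of the compressed result are assumptions made for illustration; the disclosure does not prescribe them.

```python
import json
import re
import zlib


def local_match(text, filename, keywords, patterns, watched_extensions):
    """Match keywords, regex patterns, and a file attribute (here, the extension)."""
    lowered = text.lower()
    return {
        "keywords": sorted(k for k in keywords if k.lower() in lowered),
        "patterns": sorted(name for name, rx in patterns.items() if re.search(rx, text)),
        "file_attributes": sorted(
            ext for ext in watched_extensions if filename.lower().endswith(ext)
        ),
    }


def compress_result(result):
    """Serialize and compress a local match result before sending it (assumed format)."""
    return zlib.compress(json.dumps(result, sort_keys=True).encode("utf-8"))
```

These checks involve simple string and pattern scans, which is consistent with the statement above that local matching may be less CPU intensive than fingerprint matching.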

In accordance with an embodiment of the invention, the fingerprint generator of the FLMA 132 may comprise a lower-density fingerprint generator 136. In one implementation, the lower-density fingerprint generator 136 may be configured to utilize a multiple-stage (two or more stages) fingerprint filtering technique which is described in further detail below. Due to the extra fingerprint filtering, the lower-density fingerprint generator 136 of the smart phone 130 generates substantially fewer fingerprints for the same text data than the number generated by the higher-density fingerprint generator 114 of the EFA 112. The fingerprints per document may be very small in total size. For example, the size of the fingerprints per document may be sixty-four bytes per document and may include eight fingerprints of eight bytes each.
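The sixty-four byte figure follows from eight fingerprints of eight bytes each. A minimal sketch of packing such fingerprints into a single payload is given below; the flat concatenated layout and the function names are assumptions for illustration, since the disclosure does not define a wire format.

```python
FINGERPRINT_SIZE = 8  # bytes per fingerprint, per the example above


def pack_fingerprints(fingerprints):
    """Concatenate fixed-size fingerprints into one payload (assumed layout)."""
    for fp in fingerprints:
        if len(fp) != FINGERPRINT_SIZE:
            raise ValueError("each fingerprint must be exactly 8 bytes")
    return b"".join(fingerprints)


def unpack_fingerprints(payload):
    """Split a payload back into its fixed-size fingerprints."""
    if len(payload) % FINGERPRINT_SIZE != 0:
        raise ValueError("payload length must be a multiple of 8 bytes")
    return [
        payload[i:i + FINGERPRINT_SIZE]
        for i in range(0, len(payload), FINGERPRINT_SIZE)
    ]
```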

By generating fewer fingerprints 145, the lower-density fingerprint generator 136 utilizes less telecommunications bandwidth when the fingerprints 145 are sent from the smart phone 130 to the service cloud 120. This feature is particularly advantageous due to the limited bandwidth available for cell phone data communications. The smaller bandwidth requirements due to the fewer fingerprints mean that the cloud-based DLP service (or other service using document fingerprint matching) may be provided with less impact on cell phone performance.

In accordance with an embodiment of the invention, document fingerprinting is provided for mobile phones using an asymmetric fingerprint generation system. In this case, the fingerprint generation is asymmetric in that a higher-density fingerprint generator is used on the enterprise side for sensitive document fingerprinting, and a lower-density fingerprint generator is used on the mobile device side for document variation detection. More particularly, in a specific implementation, single-stage fingerprint filtering may be used for the higher-density fingerprint generation, and multiple-stage fingerprint filtering may be used for the lower-density fingerprint generation.

Document fingerprints are document attributes used to characterize some invariants of a document. Document fingerprints should ideally have the following characteristics: i) unrelated and irrelevant documents should not share any fingerprint; and ii) a variant of a document is expected to have some common fingerprints with its original version.

The present disclosure provides a DLP solution with document fingerprinting that is sufficiently resource-efficient to be practical for use with a smart phone device. The disclosed DLP solution is advantageously content-aware in that variants of a sensitive document may be detected with minimal false positives by utilizing document fingerprinting technology. The solution is advantageously storage efficient at the smart phone in that no fingerprint database needs to be stored on the smart phone. In addition, the solution is advantageously processing and power efficient at the smart phone in that the fingerprint matching (which may be processing-intensive and power-consuming) is not performed by the smart phone. Furthermore, the solution is advantageously “bandwidth efficient” because the lower-density fingerprint generator may generate a small amount of fingerprint data (for example, less than one hundred bytes) to send to the service cloud for the matching service.

Higher-Density Fingerprint Generation

FIG. 2 depicts a first method 200 for generating document fingerprints in accordance with an embodiment of the invention. This first method 200 may be applied by the higher-density fingerprint generator 114 used by an EFA 112.

As shown in FIG. 1B, sensitive document files 111 may be stored in a data storage system in the enterprise network 110. Consider the generation of fingerprints for one file d which is deemed to include sensitive information to be protected by the DLP system.

Per the method 200 of FIG. 2, computer-readable program code may be configured to receive and normalize 202 the document d to generate a normalized document t. The document may be normalized 202 by a process that may include the following three sub-processes: i) converting a document in any format, such as Word, Excel, PPT, PDF and so forth, into a plain text encoded in UTF-8; ii) translating any plain text in other encodings into the plain text encoded in UTF-8; and iii) removing the non-informative characters such as white spaces, control characters, delimiters and so forth. The normalized document t may be a textual string encoded in UTF-8 after the document is normalized. A normalized document may be viewed as either a UTF-8 string or a binary string. In this disclosure, we generally view a normalized document as a binary string.
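Sub-process iii) and the final UTF-8 encoding may be sketched as follows. The sketch assumes the text has already been extracted and decoded (sub-processes i and ii are omitted), and it treats Unicode punctuation as the "delimiters" to be removed, which is an assumption; the function name normalize is likewise hypothetical.

```python
import unicodedata


def normalize(text: str) -> bytes:
    """Drop separators, control characters, and punctuation, then encode as UTF-8."""
    kept = []
    for ch in text:
        major = unicodedata.category(ch)[0]
        # Z* = separators (spaces), C* = control/format characters,
        # P* = punctuation (assumed here to stand for "delimiters").
        if major in ("Z", "C", "P"):
            continue
        kept.append(ch)
    return "".join(kept).encode("utf-8")
```

The result is a byte string, matching the view of the normalized document t as a binary string.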

The normalized document t is then processed to generate 204 a set of hash values h from t. In accordance with one embodiment of the invention, a rolling hash function, denoted H1, with a pre-defined hash window size may be used to slide through and process the normalized text string t in order to generate the set of hash values. An example of such a sliding hash window is shown in FIG. 3. In FIG. 3, each block in the depicted array represents a character of the normalized text string t. The rolling hash function H1 may generate one hash value h from each position of the sliding hash window. For example, if the text string is n characters long, and the hash window is w characters wide, then (n−w+1) hash values h may be generated. In a particular implementation, the rolling hash function H1 may be a Karp-Rabin function.
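A Karp-Rabin style rolling hash over a sliding window may be sketched as follows. The base and modulus values and the function name are illustrative assumptions; the disclosure does not specify the parameters of H1.

```python
def rolling_hashes(data: bytes, w: int, base: int = 257,
                   modulus: int = (1 << 61) - 1) -> list[int]:
    """Return one polynomial hash per w-byte window of data (n - w + 1 values)."""
    n = len(data)
    if w <= 0 or n < w:
        return []
    high = pow(base, w - 1, modulus)  # weight of the byte leaving the window
    h = 0
    for b in data[:w]:
        h = (h * base + b) % modulus
    hashes = [h]
    for i in range(w, n):
        # Remove the outgoing byte, shift, and add the incoming byte.
        h = ((h - data[i - w] * high) * base + data[i]) % modulus
        hashes.append(h)
    return hashes
```

Each window hash is derived from the previous one in constant time, so the whole pass is linear in the length of the string rather than proportional to n times w.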

In accordance with an embodiment of the invention, a first filter may be applied 206 to the set of hash values h. For example, in one implementation, the first filter may select those hash values h which satisfy h = 0 mod p, where p is a pre-defined prime number. The hash values selected by the first filter form a set of candidate anchoring points. Example candidate anchoring points are designated in FIG. 3 by the letter A in the array.
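The first filter may be sketched as a selection of window positions whose hash value is divisible by p. The function name and the representation of anchoring points as window start positions are assumptions for illustration.

```python
def candidate_anchor_points(hashes, p):
    """Return the window positions whose hash value h satisfies h = 0 mod p."""
    return [i for i, h in enumerate(hashes) if h % p == 0]
```

If the hash values are roughly uniformly distributed, about one window position in p would be expected to pass this filter, so a larger p yields fewer candidate anchoring points.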

In accordance with the method 200 shown in FIG. 2, the document fingerprints may then be generated 208 based on the entire set of candidate anchoring points. In other words, the entire set of candidate anchoring points is selected or kept as anchoring points such that a higher density of fingerprints may be generated in this method 200.

For each anchoring point, a second hash function H2 may be used to generate a hash value from the substring starting at that anchoring point. The size of the substring may be the same as the sliding hash window. The second hash function H2 is preferably a different hash function from the first hash function H1. Using two different hash functions advantageously reduces false positives caused by hash collisions. The set of hash values generated by the second hash function H2 may be output 210 as the document fingerprints for the document d.
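The final step may be sketched as follows, given anchoring points expressed as window start positions. BLAKE2b with an eight-byte digest is used here only as a stand-in for H2; the disclosure does not name a specific second hash function, and the function names are hypothetical.

```python
import hashlib


def h2(substring: bytes) -> bytes:
    """Stand-in for the second hash function H2: an 8-byte BLAKE2b digest."""
    return hashlib.blake2b(substring, digest_size=8).digest()


def fingerprints_at(t: bytes, anchors, w: int):
    """Hash the w-byte substring starting at each anchoring point."""
    return [h2(t[i:i + w]) for i in anchors]
```

Because a position becomes an anchoring point based on H1 and its fingerprint is computed with an independent H2, two different substrings would need to collide under H2 as well as both pass the H1 filter in order to produce a false match.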

Lower-Density Fingerprint Generation

FIG. 4 depicts a second method 400 for generating document fingerprints in accordance with an embodiment of the invention. This second method 400 may be applied by the lower-density fingerprint generator 136 used by an FLMA 132.

Like the first method 200 of FIG. 2, the second method 400 of FIG. 4 may also normalize 202 the document, generate 204 a set of hash values from the normalized document using a first hash function, and apply 206 a first filter to the set of hash values to select a set of candidate anchoring points.

Unlike the first method 200, the second method 400 applies 402 a second filter to the set of candidate anchoring points so as to select a set of anchoring points. The set of anchoring points is a subset of the set of candidate anchoring points. In accordance with one embodiment of the invention, applying 402 the second filter may involve dividing the normalized text string in binary form into a plurality of N pieces, and selecting only one of the candidate anchoring points per piece. In an exemplary embodiment, the N pieces may be of equal binary size, except for the last piece (whose size depends on the remaining length of the string for that last piece). FIG. 5 depicts the division of a normalized textual string into a number of pieces in accordance with an embodiment of the invention.

For any piece, if the piece contains multiple candidate anchoring points (previously selected by the application 206 of the first filter), then the candidate anchoring point closest to the center of the piece is selected to be the anchoring point for that piece. FIG. 6 shows the selection of a single anchoring point (the rectangle filled in black) from a first piece in accordance with an embodiment of the invention.
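The second filter may be sketched as follows. Several details are assumptions made for the example and are not stated in the disclosure: the piece size is taken as the integer quotient n // N with the last piece absorbing the remainder, a candidate is assigned to a piece by its start position, ties are broken in favor of the earlier position, and a piece with no candidate contributes no anchoring point.

```python
def select_anchor_points(candidates, n, num_pieces):
    """Keep at most one candidate per piece: the one closest to the piece's center."""
    if n <= 0 or num_pieces <= 0:
        return []
    size = max(1, n // num_pieces)
    selected = []
    for k in range(num_pieces):
        start = k * size
        if start >= n:
            break
        # The last piece extends to the end of the string.
        end = n if k == num_pieces - 1 else start + size
        in_piece = [c for c in candidates if start <= c < end]
        if not in_piece:
            continue
        center = (start + end - 1) / 2
        selected.append(min(in_piece, key=lambda c: (abs(c - center), c)))
    return selected
```

For a string of 100 bytes divided into four pieces, the candidates 3, 10, and 13 all fall in the first piece (positions 0 to 24, center 12), so only 13 is kept for that piece.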

As in the first method 200, for each anchoring point, a second hash function H2 may be used to generate a hash value from the substring starting at that anchoring point. The size of the substring may be the same as the sliding hash window. The second hash function H2 is preferably a different hash function from the first hash function H1 so as to advantageously reduce false positives caused by hash collisions. The set of hash values generated by the second hash function H2 may be output 210 as the document fingerprints for the document d.
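The relationship between the two methods may be illustrated with a compact end-to-end sketch. For brevity, CRC-32 computed independently on each window stands in for H1 (it is not a rolling implementation), BLAKE2b with an eight-byte digest stands in for H2, and the piece layout follows the same assumptions as any integer-division split with the remainder in the last piece. All names and parameters are hypothetical.

```python
import hashlib
import zlib


def candidates(t: bytes, w: int, p: int):
    """First filter: window positions whose stand-in H1 value is divisible by p."""
    return [i for i in range(len(t) - w + 1) if zlib.crc32(t[i:i + w]) % p == 0]


def one_per_piece(cands, n: int, num_pieces: int):
    """Second filter: keep the candidate closest to the center of each piece."""
    size = max(1, n // num_pieces)
    chosen = []
    for k in range(num_pieces):
        start = k * size
        end = n if k == num_pieces - 1 else start + size
        piece = [c for c in cands if start <= c < end]
        if piece:
            center = (start + end - 1) / 2
            chosen.append(min(piece, key=lambda c: (abs(c - center), c)))
    return chosen


def h2(sub: bytes) -> bytes:
    return hashlib.blake2b(sub, digest_size=8).digest()


def higher_density(t: bytes, w: int, p: int):
    """Method 200: fingerprint every candidate anchoring point."""
    return {h2(t[i:i + w]) for i in candidates(t, w, p)}


def lower_density(t: bytes, w: int, p: int, num_pieces: int):
    """Method 400: fingerprint only the anchoring points kept by the second filter."""
    kept = one_per_piece(candidates(t, w, p), len(t), num_pieces)
    return {h2(t[i:i + w]) for i in kept}
```

Under these assumptions, the lower-density fingerprints of a text are always a subset of its higher-density fingerprints, and there are at most N of them. This is why a small set of fingerprints sent from the phone can still be matched against a database built with the higher-density method.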

Example Computer Apparatus and Mobile Device

Referring to FIG. 7, there is shown a schematic diagram of an example computer apparatus 700 that may be used in embodiments of the present invention. For example, one or more computer apparatus 700 may be utilized in the enterprise network 110 to execute computer-readable program code configured to implement the enterprise fingerprint agent 112. In addition, one or more computer apparatus 700 may be utilized in the service cloud 120 to execute computer-readable program code configured to implement the fingerprint match engine 124 and the DLP policy engine 126.

As shown in the figure, the computer may include a processor 701, such as those from the Intel Corporation or Advanced Micro Devices, for example. The computer may have one or more buses 703 coupling its various components. The computer may include one or more input devices 702 (e.g., keyboard, mouse, etc.), a display monitor 704 (e.g., LCD, cathode ray tube, flat panel display, etc.), a computer network or communications interface 705 (e.g., network adapters, wireless network adapters, etc.) for communicating over a computer (data) network 709, one or more data storage devices 706 (e.g., hard disk drive, optical drive, FLASH memory, etc.) for storing computer-readable data onto computer-readable media and for reading the data therefrom, and a main memory 708 (e.g., DRAM, SRAM, etc.).

Computer-readable data (including computer-readable program instructions) may be stored in the data storage devices 706 and may be loaded into main memory 708. Computer-readable data may also be received over the computer network 709 by way of a communications interface 705. In particular, the main memory 708 may be loaded with programs 710 (comprising computer-readable instruction code and data) which may be executed by the processor 701 to perform some of the functionalities and operations as described herein.

Referring to FIG. 8, there is shown a schematic diagram of an example smart phone device 810 that may be used in embodiments of the present invention. For example, the smart phone 810 may be utilized to execute computer-readable program code configured to implement the fingerprint-less mobile agent 132.

As shown in the figure, the smart phone 810 may include memory 813 for storing data, at least one processor 814 for executing computer-readable code, and communication circuits 815. The communication circuits 815 may be configured for wireless data communications via a cellular phone network. The memory 813 may hold the FLMA 132, and the processor 814 may execute the FLMA 132. The smart phone 810 may also include a file system 112 which may include the target text data 131 which may be checked for sensitive data by the FLMA 132.

Spam Filtering Service with Document Fingerprinting

FIG. 9 depicts select operational elements of a system for providing a spam filtering service on smart phones in accordance with an embodiment of the invention. The elements for providing the spam filtering service for smart phones per FIG. 9 are similar to the elements for providing the data leakage prevention service for smart phones per FIG. 1B.

In the system of FIG. 9, a spam filtering agent 912 generates document fingerprints of spam samples 911 using a higher-density fingerprint generator 114. These fingerprints may be stored and indexed on a local fingerprint database 116. The service cloud 120 may receive database updates 117 so as to keep the hosted fingerprint database 122 in relatively close synchronization with the local fingerprint database 116.

The smart phone 130 may include a target incoming message 931 which may be filtered for spam by the FLMA (for spam filtering) 932. The FLMA 932 may use the lower-density fingerprint generator 136 to generate fingerprints from the target incoming message 931. These fingerprints 145 may be sent to the fingerprint match engine 124 of the service cloud 120 for matching against the hosted spam fingerprint database 922.

Results of the matching may be provided to the spam policy engine 926. The spam policy engine 926 may apply policies to determine the feedback 150 to return based on the matching results. The policies implemented by the spam policy engine 926 may be customized for an organization or for a particular user. Based on the feedback 150, the smart phone may perform actions, such as blocking the message and/or alerting the user of the smart phone or an administrator of an organization.

While specific embodiments of the present invention have been provided, it is to be understood that these embodiments are for illustration purposes and not limiting. Many additional embodiments will be apparent to persons of ordinary skill in the art reading this disclosure.

In the present disclosure, numerous specific details are provided, such as examples of apparatus, components, and methods, to provide a thorough understanding of embodiments of the invention. Persons of ordinary skill in the art will recognize, however, that the invention can be practiced without one or more of the specific details. In other instances, well-known details are not shown or described to avoid obscuring aspects of the invention.

Being computer-related, it can be appreciated that some components disclosed herein may be implemented in hardware, software, or a combination of hardware and software (e.g., firmware). Software components may be in the form of computer-readable program code stored in a computer-readable storage medium, such as memory, mass storage device, or removable storage device. For example, a computer-readable storage medium may comprise computer-readable program code for performing the function of a particular component. Likewise, computer memory may be configured to include one or more components, which may be executed by a processor. Components may be implemented separately in multiple modules or together in a single module.

What is claimed is:
 1. A method for providing a service which matches document fingerprints against a database of document fingerprints, the method comprising: obtaining target text data on a mobile phone device; generating target document fingerprints for the target text data using a fingerprint generator on the mobile phone device; transmitting the target document fingerprints from the mobile phone device to a service cloud; and receiving a feedback message by the mobile phone device from the service cloud, wherein the feedback message depends on results from matching the target document fingerprints against the database of document fingerprints, wherein the fingerprint generator on the mobile phone device comprises a lower-density fingerprint generator, and wherein indexed document fingerprints in the database of document fingerprints are generated by a higher-density fingerprint generator.
 2. The method of claim 1, wherein the target text data is obtained from an outgoing data communication from the mobile phone device.
 3. The method of claim 2, wherein the service cloud is configured to provide a data leakage prevention service to an enterprise network, and wherein the mobile phone device is outside of the enterprise network.
 4. The method of claim 1, wherein the target text data is obtained from an incoming data communication that has arrived at the mobile phone device.
 5. The method of claim 4, wherein the service cloud is configured to provide a spam filtering service.
 6. The method of claim 1, wherein the lower-density fingerprint generator generates the target document fingerprints by a multiple-filtering procedure, and wherein the higher-density fingerprint generator generates document fingerprints using a single-filtering procedure.
 7. The method of claim 1, wherein the lower-density fingerprint generator generates the target document fingerprints by a procedure comprising: normalizing target text data to create a normalized text string; applying a first hash function with a sliding hash window to the normalized text string to generate an array of hash values; applying a first filter to the array of hash values to select candidate anchoring points; applying a second filter to the candidate anchoring points to select anchoring points; and applying a second hash function to substrings located at the selected anchoring points to generate the target document fingerprints.
 8. The method of claim 1, wherein the higher-density fingerprint generator generates the document fingerprints by a procedure comprising: normalizing target text data to create a normalized text string; applying a first hash function with a sliding hash window to the normalized text string to generate an array of hash values; applying a first filter to the array of hash values to select candidate anchoring points; and applying a second hash function to substrings located at the candidate anchoring points to generate the document fingerprints.
 9. A mobile phone device comprising: communication circuits configured to receive and send data by way of a cellular telecommunications network; a data storage system, including memory, configured to store computer-readable instruction code and data; a processor configured to access the data storage system and to execute the computer-readable instruction code; the data storage system storing therein computer-readable instruction code configured to provide a fingerprint generator on the mobile phone, computer-readable instruction code configured to obtain target text data from the data storage system on the mobile phone device, computer-readable instruction code configured to generate target document fingerprints for the target text data using the fingerprint generator, computer-readable instruction code configured to transmit the target document fingerprints to a service cloud, and computer-readable instruction code configured to receive a feedback message from the service cloud, wherein the feedback message depends on results from matching the target document fingerprints against the database of document fingerprints, wherein the target text data is obtained from an outgoing data communication from the mobile phone device, wherein the service cloud is configured to provide a data leakage prevention service to an enterprise network, and wherein the mobile phone device further comprises a fingerprint-less mobile agent for the data leakage prevention service.
 10. The mobile phone device of claim 9, wherein the target text data is obtained from an incoming data communication that has arrived at the mobile phone device.
 11. The mobile phone device of claim 10, wherein the service cloud is configured to provide a spam filtering service, and wherein the mobile phone device further comprises a fingerprint-less mobile agent for the spam filtering service.
 12. The mobile phone device of claim 9, wherein the fingerprint generator is programmed to generate the target document fingerprints using a multiple-filtering procedure.
 13. The mobile phone device of claim 9, wherein the fingerprint generator is programmed to generate the target document fingerprints by a procedure comprising: normalizing target text data to create a normalized text string; applying a first hash function with a sliding hash window to the normalized text string to generate an array of hash values; applying a first filter to the array of hash values to select candidate anchoring points; applying a second filter to the candidate anchoring points to select anchoring points; and applying a second hash function to substrings located at the selected anchoring points to generate the target document fingerprints.
 14. A computer apparatus comprising: data storage device configured to store computer-readable instruction code and data; a processor configured to access the data storage device and to execute said computer-readable instruction code; computer-readable instruction code configured as an enterprise fingerprint agent for a cloud-based service; the data storage device storing therein computer-readable instruction code configured to generate document fingerprints using a higher-density fingerprint generation procedure which comprises: normalizing target text data to create a normalized text string; applying a first hash function with a sliding hash window to the normalized text string to generate an array of hash values; applying a first filter to the array of hash values to select candidate anchoring points; and applying a second hash function to substrings located at the candidate anchoring points to generate the document fingerprints.
 15. The computer apparatus of claim 14, wherein the cloud-based service comprises a data leakage prevention service.
 16. The computer apparatus of claim 14, wherein the cloud-based service comprises a spam filtering service. 
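
For illustration only, the single-filtering (higher-density) and multiple-filtering (lower-density) procedures recited in claims 7, 8, 13 and 14 may be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the window size, substring length, polynomial rolling hash, modulus-based filters, and the use of SHA-1 as the second hash function are all hypothetical choices that the claims do not specify, as are all function and constant names.

```python
import hashlib
import re

# All constants below are illustrative assumptions, not values from the disclosure.
WINDOW = 8        # size of the sliding hash window
SUBSTR_LEN = 32   # length of the substring hashed at each anchoring point
FIRST_MOD = 4     # first filter keeps hash values divisible by this
SECOND_MOD = 3    # second filter thins the candidates further


def normalize(text):
    """Create a normalized text string: lowercase, alphanumerics only (assumed rule)."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def window_hashes(s, window=WINDOW):
    """First hash function: a polynomial rolling hash over every sliding window."""
    base, mod = 257, (1 << 31) - 1
    if len(s) < window:
        return []
    h = 0
    for ch in s[:window]:
        h = (h * base + ord(ch)) % mod
    hashes = [h]
    top = pow(base, window - 1, mod)  # weight of the character leaving the window
    for i in range(window, len(s)):
        h = ((h - ord(s[i - window]) * top) * base + ord(s[i])) % mod
        hashes.append(h)
    return hashes


def candidate_anchors(hashes):
    """First filter: select candidate anchoring points (assumed modulus test)."""
    return [i for i, h in enumerate(hashes) if h % FIRST_MOD == 0]


def select_anchors(hashes, candidates):
    """Second filter: keep a subset of the candidates (assumed modulus test)."""
    return [i for i in candidates if (hashes[i] // FIRST_MOD) % SECOND_MOD == 0]


def fingerprints_at(s, anchors):
    """Second hash function: SHA-1 of the substring starting at each anchor."""
    return {
        hashlib.sha1(s[i:i + SUBSTR_LEN].encode("utf-8")).hexdigest()
        for i in anchors
    }


def higher_density_fingerprints(text):
    """Single-filtering procedure: fingerprint every candidate anchoring point."""
    s = normalize(text)
    hashes = window_hashes(s)
    return fingerprints_at(s, candidate_anchors(hashes))


def lower_density_fingerprints(text):
    """Multiple-filtering procedure: fingerprint only anchors passing both filters."""
    s = normalize(text)
    hashes = window_hashes(s)
    candidates = candidate_anchors(hashes)
    return fingerprints_at(s, select_anchors(hashes, candidates))
```

In this sketch the second filter only removes candidates, so every fingerprint produced by the lower-density generator for a given text is also produced by the higher-density generator for the same text. That subset relationship is what would allow sparse fingerprints generated on a mobile phone device to be matched against a densely indexed database.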